Ranking Web Pages Using Collective Knowledge
Authors
Abstract
Indexing is a crucial technique for dealing with the massive amount of data present on the web. Indexing can be performed based on words or on phrases. Our approach aims to index web documents efficiently by employing a hybrid technique in which documents are indexed so that the knowledge available in Wikipedia and in meta-content is used efficiently. Our preliminary experiments on the TREC dataset have shown that our indexing scheme is a robust and efficient method for both indexing and retrieving relevant web pages. We ranked term queries in different ways, depending on whether they were found in Wikipedia pages. This paper presents our preliminary algorithm and experiments for the ad-hoc and diversity tasks of the TREC 2011 Web track. We ran our system on subset B (50 million web documents) of the ClueWeb09 dataset.
Categories and Subject Descriptors: Web Information Retrieval: Content Analysis, Indexing, and Ranking
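The idea of weighting query terms differently depending on whether they appear in Wikipedia can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a plain word-level inverted index, a TF-IDF style score, a hypothetical set `WIKIPEDIA_TITLES` standing in for a Wikipedia lookup, and a hypothetical boost factor `WIKI_BOOST`.

```python
import math
from collections import defaultdict

# Hypothetical stand-in for a Wikipedia lookup: a set of lowercase page titles.
WIKIPEDIA_TITLES = {"jaguar", "information retrieval"}

# Hypothetical boost for terms found in Wikipedia; the abstract gives no weights.
WIKI_BOOST = 2.0


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Build a word-level inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def rank(query, index, num_docs):
    """Return doc ids ordered by score, boosting terms found in Wikipedia."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any indexed document
        idf = math.log(1 + num_docs / len(postings))
        weight = WIKI_BOOST if term in WIKIPEDIA_TITLES else 1.0
        for doc_id, tf in postings.items():
            scores[doc_id] += weight * tf * idf
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Calling `rank("jaguar speed", build_index(docs), len(docs))` on a small dictionary of documents returns the matching document ids, best first. The boost only changes the relative weight of terms; a phrase-level index or meta-content fields, which the abstract also mentions, would need additional structures not shown here.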
Similar resources
A New Hybrid Method for Web Pages Ranking in Search Engines
There are many algorithms for optimizing search engine results. Ranking takes place according to one or more parameters, such as backward links, forward links, content, and click-through rate. The quality and performance of these algorithms depend on the listed parameters. Ranking is one of the most important components of a search engine that represents the degree of the vitality...
Efficient Methodologies to Handle Hanging Pages Using Virtual Node
In this paper we first explain the Knowledge Extraction (KE) process from the World Wide Web (WWW) using search engines. Then we explore the PageRank algorithm of the Google search engine (one of the best-known link-based search engines) with its hidden Markov analysis. We also explore one of the problems of link-based ranking algorithms, called hanging pages or dangling pages (pages without any forw...
Web pages ranking algorithm based on reinforcement learning and user feedback
The main challenge of a search engine is ranking web documents to provide the best response to a user's query. Despite the huge number of results extracted for a user's query, only a small number of the first results are examined by users; therefore, placing the relevant results in the first ranks is of great importance. In this paper, a ranking algorithm based on the reinforcement le...
Modeling and Leveraging Social Collective Intelligence
The rise of social interactions on the Web requires developing new methods of information organization and discovery. To that end, we propose a generative community-based probabilistic tagging model that can automatically uncover communities of users and their associated tags. We experimentally validate the quality of the discovered communities over the social bookmarking system Delicious. In c...
A semantic self-organising webpage-ranking algorithm using computational geometry across different knowledge domains
In this paper we introduce a method for web page ranking based on computational geometry, which evaluates and tests, by examples, order relationships among web pages belonging to different knowledge domains. The goal is, through an organising procedure, to learn from these examples a real-valued ranking function that induces ranking via a convexity feature. We consider the problem of self-organising ...
Publication date: 2011